This notebook demonstrates the full lifecycle of machine learning model development for predicting which COVID-19 patients will require admission to the ICU, a problem that is also the subject of a Kaggle competition. The notebook consists of the following main sections:
The project's main objective was to develop a machine learning solution that predicts whether a patient will be admitted to the intensive care unit (ICU). Early predictions would allow ICU resources to be prepared and patient transfers to be planned, and would let frontline doctors safely discharge patients who are unlikely to need intensive care and follow them up remotely.
The project is concerned with COVID-19 cases in Brazil, one of the nations most affected by the pandemic, with more than 16 million confirmed cases and 454,429 confirmed deaths by May 26, 2021 (according to the Johns Hopkins Coronavirus Resource Center). Brazil recorded its first case on February 26, 2020, and community transmission began on March 20, 2020. The first wave left the country unprepared and unable to respond because of the pressure on hospital capacity, including prolonged and intense demand for ICU (intensive care unit) beds, staff, personal protective equipment and medical resources.
This dataset, which comprises 1,925 rows and 231 columns, contains anonymized data from Hospital Sírio-Libanês in São Paulo and Brasília. PATIENT_VISIT_IDENTIFIER identifies each patient: the dataset covers 385 patients, each with 5 rows of records. The target is the ICU column, a binary flag that is 1 if the patient was admitted to the ICU and 0 otherwise, and the WINDOW column indicates the time window (in hours from hospital admission) that each row covers. Other notable features in this dataset include patient demographics, grouped previous diseases, blood results and vital signs.
It was observed that 195 patients were admitted to the ICU and 190 patients were not.
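Because every patient appears in five rows, these counts are per patient rather than per row. A minimal sketch of how they could be reproduced, assuming a patient counts as admitted when ICU equals 1 in any of their windows (the function name is illustrative, not from the original notebook):

```python
import pandas as pd


def patient_icu_counts(df: pd.DataFrame, id_col: str = "PATIENT_VISIT_IDENTIFIER",
                       target: str = "ICU") -> pd.Series:
    # Collapse each patient's window rows to one flag: 1 if ICU == 1 in any window.
    ever_admitted = df.groupby(id_col)[target].max()
    # Number of patients per class (0 = never admitted, 1 = admitted).
    return ever_admitted.value_counts().sort_index()
```

Calling `patient_icu_counts(data)` on the raw table returns one count per class.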
The dataset was then transposed (pivoted from one row per time window to one row per patient), converting the data to 385 rows and 1,151 columns.
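The reshaping from one row per time window to one row per patient can be sketched with pandas `DataFrame.pivot`. This is an assumed approach rather than the notebook's exact code, and the column-naming scheme (feature name plus window suffix) is a hypothetical choice:

```python
import pandas as pd


def to_wide(df: pd.DataFrame, id_col: str = "PATIENT_VISIT_IDENTIFIER",
            window_col: str = "WINDOW") -> pd.DataFrame:
    # One row per patient; every remaining column is spread across the windows,
    # which yields (feature, window) MultiIndex columns.
    wide = df.pivot(index=id_col, columns=window_col)
    # Flatten to single-level names such as "HEART_RATE_MEAN_0-2" (hypothetical scheme).
    wide.columns = [f"{feature}_{window}" for feature, window in wide.columns]
    return wide.reset_index()
```

`pivot` raises an error if a patient has two rows for the same window, which doubles as a sanity check on the long-format data.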
The dataset has 223,863 missing cells (50.3%), and some features are more than 89% missing. Columns with more than 50% null values were dropped, and the remaining missing values were imputed with backward fill, as suggested by the dataset providers. This downsized the dataset to 230 columns and 394 rows, as one patient visit identifier had missing values across the entire row.
As instructed by the dataset providers, data gathered after ICU admission should not be used, and patients admitted to the ICU in the first window (0-2 hours) were excluded. After eliminating these rows and columns, the dataset was reduced further to 352 rows and 50 columns.
During exploration, the dataset was divided into patient-constant features (features that keep the same value for a single patient across all time points) and time-variant features (features that take multiple values for the same patient, such as repeated lab test results over time) for better visualization. The patient-constant features showed that more males and more patients over the age of 65 were admitted to the ICU, while the time-variant features showed correlations involving the DIFF and DIFF_REL feature clusters.
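The cleaning and feature-splitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: it works on the long (one row per window) layout, uses the window label `"0-2"` seen in the data, adds a forward fill after the backward fill so trailing gaps are also covered, and labels each patient's first window with whether they were ever admitted. The names `clean_icu_data` and `split_feature_types` are hypothetical.

```python
import pandas as pd

ID, WINDOW, TARGET = "PATIENT_VISIT_IDENTIFIER", "WINDOW", "ICU"


def clean_icu_data(df: pd.DataFrame, max_null_frac: float = 0.5) -> pd.DataFrame:
    # 1. Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_null_frac].copy()
    feats = [c for c in df.columns if c not in (ID, WINDOW, TARGET)]
    # 2. Fill gaps within each patient only: backward first, then forward
    #    (an added assumption) for trailing gaps, so values never leak across patients.
    df[feats] = df.groupby(ID)[feats].bfill()
    df[feats] = df.groupby(ID)[feats].ffill()
    # 3. Exclude patients who were already in the ICU in the first window.
    early = df.loc[(df[WINDOW] == "0-2") & (df[TARGET] == 1), ID]
    df = df[~df[ID].isin(early)]
    # 4. Keep each patient's earliest window, labelled with whether the
    #    patient was admitted to the ICU in any later window.
    ever_icu = df.groupby(ID)[TARGET].max()
    first = df[df[WINDOW] == "0-2"].copy()
    first[TARGET] = first[ID].map(ever_icu)
    return first.reset_index(drop=True)


def split_feature_types(df: pd.DataFrame):
    # Patient-constant: at most one distinct non-null value within every patient.
    feats = [c for c in df.columns if c not in (ID, WINDOW, TARGET)]
    per_patient = df.groupby(ID)[feats].nunique()
    constant = [c for c in feats if (per_patient[c] <= 1).all()]
    time_variant = [c for c in feats if c not in constant]
    return constant, time_variant
```

Filling inside `groupby` is the important design choice: a plain `df.bfill()` would copy the next patient's lab values into the previous patient's last rows.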
Correlation
Checking the correlations visually was not helpful because of the large number of features, so the analysis was split in two. First, we looked at how the features relate to each other, excluding the target column, and adopted a stacked format (one row per feature pair) to make the correlations easier to read.
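A sketch of the stacked format, assuming Pearson correlation over the numeric columns; `stacked_correlations` is a hypothetical helper. Keeping only the upper triangle of the matrix means each feature pair is listed once and self-correlations are skipped:

```python
import numpy as np
import pandas as pd


def stacked_correlations(df: pd.DataFrame, target: str = "ICU", n: int = 20) -> pd.DataFrame:
    # Feature-to-feature correlations only, so the target is left out.
    corr = df.drop(columns=[target], errors="ignore").corr(numeric_only=True)
    # Keep the strict upper triangle: each pair once, no self-correlations.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    pairs.index.names = ["feature_1", "feature_2"]
    # Rank by absolute strength but keep the sign of each correlation.
    pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).rename("corr").reset_index()
```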

Second, we looked at how these features relate to the ICU target.
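Ranking features by their correlation with the target can be sketched in the same way. For a 0/1 target such as ICU, the Pearson correlation is the point-biserial correlation. The helper name is illustrative:

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "ICU") -> pd.Series:
    # Correlation of every numeric feature with the target, target itself removed.
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Strongest relationships first, regardless of sign.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```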

Feature Encoding
Feature encoding was performed on AGE_PERCENTIL because it is an object (string) column.
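One way to encode the column is ordinally, so that the age ordering survives. This sketch assumes labels of the form `"60th"` (as seen in the data) plus a top bucket written like `"Above 90th"`; that second form and the offset of 5 for it are assumptions, and one-hot encoding with `pd.get_dummies` would be a reasonable alternative:

```python
import pandas as pd


def encode_age_percentile(s: pd.Series) -> pd.Series:
    # Pull out the numeric part, e.g. "60th" -> 60, "Above 90th" -> 90.
    num = s.str.extract(r"(\d+)", expand=False).astype(int)
    # Nudge "Above ..." labels just past their bucket so the order is preserved.
    return num + s.str.startswith("Above").astype(int) * 5
```

For example, `data["AGE_PERCENTIL"] = encode_age_percentile(data["AGE_PERCENTIL"])` would replace the strings with ordered integers.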
The dataset was split into training and validation sets, with 90% of the data allocated to training and 10% to validation. Ensemble learning methods were implemented using 8 modelling algorithms, and some of the results are particularly outstanding, especially given that only 46% of the target values are positive.
A perfect model has an ROC-AUC score of 1, while a model that is no better than random guessing has an ROC-AUC score of 0.5. By this measure, algorithms such as KNN, Decision Tree and SVM scored low, whereas Random Forest performed significantly better. This is an encouraging sign, because some of the models were not much more accurate than picking patients at random for ICU admission.
Cross-validation has several advantages over a single train/validation split for evaluating a model. It gives a more reliable estimate of performance because every observation is used for both training and evaluation across the folds. It can also be used to tune hyper-parameters, which are model-specific settings that are chosen before training rather than learned from the data.
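The split, model comparison and cross-validation described above can be sketched with scikit-learn. This illustration runs on synthetic data from `make_classification`, standing in for the cleaned ICU table; it uses six of the scikit-learn classifiers imported in this notebook rather than the full set of 8, and `compare_models` is a hypothetical name:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    # 90% training / 10% validation, stratified so both classes appear in each part.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.1, stratify=y, random_state=seed)
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(probability=True, random_state=seed),
        "RandomForest": RandomForestClassifier(random_state=seed),
        "GradientBoosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # 5-fold cross-validated ROC-AUC on the training portion only.
        cv = cross_validate(model, X_train, y_train, cv=5, scoring="roc_auc")
        # Hold-out ROC-AUC: fit on all training data, score probabilities on validation.
        model.fit(X_train, y_train)
        val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        results[name] = {"cv_auc": cv["test_score"].mean(), "val_auc": val_auc}
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=352, n_features=20, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]["cv_auc"])
    for name, r in ranked:
        print(f"{name:18s} cv_auc={r['cv_auc']:.3f} val_auc={r['val_auc']:.3f}")
```

With 352 rows, a 10% validation split holds only about 35 patients, so a single hold-out ROC-AUC can swing noticeably between random seeds; the cross-validated mean is reported alongside it for that reason.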

Hyper-parameters
Feature selection and hyper-parameter tuning were performed on the Random Forest model to improve its accuracy.
We were able to improve the validation accuracy by around 6% just by adjusting the algorithm's settings. The model should now correctly predict whether a patient will need an ICU bed in more than 80% of cases.
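Feature selection and hyper-parameter tuning can be combined in one scikit-learn `Pipeline`, so that features are re-selected inside every cross-validation fold instead of on the full data. The grid values, the `step=2` elimination rate and the smaller forest used inside `RFE` are illustrative assumptions, not the settings used in this notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def tune_random_forest(X, y, seed=0, cv=5, n_jobs=None):
    pipe = Pipeline([
        # Recursive feature elimination driven by a small forest's importances.
        ("select", RFE(RandomForestClassifier(n_estimators=25, random_state=seed), step=2)),
        # The classifier whose hyper-parameters are tuned.
        ("model", RandomForestClassifier(random_state=seed)),
    ])
    grid = {
        "select__n_features_to_select": [5, 10],
        "model__n_estimators": [100, 200],
        "model__max_depth": [None, 5],
    }
    # Each combination is scored by cross-validated accuracy; the best is refit on all data.
    search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy", n_jobs=n_jobs)
    return search.fit(X, y)


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=12, random_state=0)
    search = tune_random_forest(X, y, cv=3)
    print(search.best_params_, round(search.best_score_, 3))
```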
In this notebook we developed prediction models for the ICU admission classification problem. By concentrating on the earliest data available for each patient, we obtained a reasonably accurate model. Its ability to classify patients correctly for both target values (admitted and not admitted) is one sign that the data processing stage was successful.
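Checking performance on both target values can be sketched with a confusion matrix and per-class recall; the helper and the class names in the returned dictionary are illustrative:

```python
from sklearn.metrics import confusion_matrix, recall_score


def per_class_report(y_true, y_pred):
    # Rows = actual class, columns = predicted class (label 0 first, then 1).
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # Recall per class: share of actual 0s and actual 1s that were predicted correctly.
    recalls = recall_score(y_true, y_pred, labels=[0, 1], average=None)
    return cm, dict(zip(["not_admitted", "admitted"], recalls))
```

A model that only ever predicted the majority class would show a recall near 0 for the other class, even with a respectable overall accuracy.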
We must reiterate our caution that working with tiny datasets restricts how confident we can be in our findings.
# Standard library and data handling
import math
import warnings
import numpy as np
import pandas as pd
import pandas_profiling as pdp  # renamed to ydata_profiling in newer releases
import missingno as msno
from natsort import index_natsorted
# Plotting libraries (matplotlib/seaborn and Plotly)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
# scikit-learn: preprocessing, feature selection, model selection and metrics
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split, cross_validate, GridSearchCV
from sklearn.metrics import (confusion_matrix, accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
# Classifiers
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.svm import SVC
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
warnings.filterwarnings('ignore')
%matplotlib inline
pd.set_option('display.max_columns', 100)
# The dataset was converted from the original Excel file to CSV before loading
data = pd.read_csv("ICU_Prediction.csv")
We first converted the dataset to CSV and then loaded it.
Here is an excerpt of the data description for the competition:
Available data includes patient demographic information (3 features), patient previous grouped diseases (9), blood results (36) and vital signs (6).
The WINDOW column records the time window, in hours from hospital admission, that each row covers, and it has 5 time slots: 0-2, 2-4, 4-6, 6-12 and ABOVE_12.
Let's take a first look at the first and last rows of the dataset to confirm all of this.
data.head()
|  | PATIENT_VISIT_IDENTIFIER | AGE_ABOVE65 | AGE_PERCENTIL | GENDER | DISEASE GROUPING 1 | DISEASE GROUPING 2 | DISEASE GROUPING 3 | DISEASE GROUPING 4 | DISEASE GROUPING 5 | DISEASE GROUPING 6 | HTN | IMMUNOCOMPROMISED | OTHER | ALBUMIN_MEDIAN | ALBUMIN_MEAN | ALBUMIN_MIN | ALBUMIN_MAX | ALBUMIN_DIFF | BE_ARTERIAL_MEDIAN | BE_ARTERIAL_MEAN | BE_ARTERIAL_MIN | BE_ARTERIAL_MAX | BE_ARTERIAL_DIFF | BE_VENOUS_MEDIAN | BE_VENOUS_MEAN | BE_VENOUS_MIN | BE_VENOUS_MAX | BE_VENOUS_DIFF | BIC_ARTERIAL_MEDIAN | BIC_ARTERIAL_MEAN | BIC_ARTERIAL_MIN | BIC_ARTERIAL_MAX | BIC_ARTERIAL_DIFF | BIC_VENOUS_MEDIAN | BIC_VENOUS_MEAN | BIC_VENOUS_MIN | BIC_VENOUS_MAX | BIC_VENOUS_DIFF | BILLIRUBIN_MEDIAN | BILLIRUBIN_MEAN | BILLIRUBIN_MIN | BILLIRUBIN_MAX | BILLIRUBIN_DIFF | BLAST_MEDIAN | BLAST_MEAN | BLAST_MIN | BLAST_MAX | BLAST_DIFF | CALCIUM_MEDIAN | CALCIUM_MEAN | ... | TTPA_MAX | TTPA_DIFF | UREA_MEDIAN | UREA_MEAN | UREA_MIN | UREA_MAX | UREA_DIFF | DIMER_MEDIAN | DIMER_MEAN | DIMER_MIN | DIMER_MAX | DIMER_DIFF | BLOODPRESSURE_DIASTOLIC_MEAN | BLOODPRESSURE_SISTOLIC_MEAN | HEART_RATE_MEAN | RESPIRATORY_RATE_MEAN | TEMPERATURE_MEAN | OXYGEN_SATURATION_MEAN | BLOODPRESSURE_DIASTOLIC_MEDIAN | BLOODPRESSURE_SISTOLIC_MEDIAN | HEART_RATE_MEDIAN | RESPIRATORY_RATE_MEDIAN | TEMPERATURE_MEDIAN | OXYGEN_SATURATION_MEDIAN | BLOODPRESSURE_DIASTOLIC_MIN | BLOODPRESSURE_SISTOLIC_MIN | HEART_RATE_MIN | RESPIRATORY_RATE_MIN | TEMPERATURE_MIN | OXYGEN_SATURATION_MIN | BLOODPRESSURE_DIASTOLIC_MAX | BLOODPRESSURE_SISTOLIC_MAX | HEART_RATE_MAX | RESPIRATORY_RATE_MAX | TEMPERATURE_MAX | OXYGEN_SATURATION_MAX | BLOODPRESSURE_DIASTOLIC_DIFF | BLOODPRESSURE_SISTOLIC_DIFF | HEART_RATE_DIFF | RESPIRATORY_RATE_DIFF | TEMPERATURE_DIFF | OXYGEN_SATURATION_DIFF | BLOODPRESSURE_DIASTOLIC_DIFF_REL | BLOODPRESSURE_SISTOLIC_DIFF_REL | HEART_RATE_DIFF_REL | RESPIRATORY_RATE_DIFF_REL | TEMPERATURE_DIFF_REL | OXYGEN_SATURATION_DIFF_REL | WINDOW | ICU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 60th | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.086420 | -0.230769 | -0.283019 | -0.593220 | -0.285714 | 0.736842 | 0.086420 | -0.230769 | -0.283019 | -0.586207 | -0.285714 | 0.736842 | 0.237113 | 0.0000 | -0.162393 | -0.500000 | 0.208791 | 0.898990 | -0.247863 | -0.459459 | -0.432836 | -0.636364 | -0.420290 | 0.736842 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 0-2 | 0 |
| 1 | 0 | 1 | 60th | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.333333 | -0.230769 | -0.132075 | -0.593220 | 0.535714 | 0.578947 | 0.333333 | -0.230769 | -0.132075 | -0.586207 | 0.535714 | 0.578947 | 0.443299 | 0.0000 | -0.025641 | -0.500000 | 0.714286 | 0.838384 | -0.076923 | -0.459459 | -0.313433 | -0.636364 | 0.246377 | 0.578947 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 2-4 | 0 |
| 2 | 0 | 1 | 60th | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.605263 | 0.605263 | 0.605263 | 0.605263 | -1.0 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.0 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.938950 | -0.938950 | -0.938950 | -0.938950 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.183673 | 0.183673 | ... | -0.825613 | -1.0 | -0.836145 | -0.836145 | -0.836145 | -0.836145 | -1.0 | -0.994912 | -0.994912 | -0.994912 | -0.994912 | -1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4-6 | 0 |
| 3 | 0 | 1 | 60th | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -0.107143 | 0.736842 | NaN | NaN | NaN | NaN | -0.107143 | 0.736842 | NaN | NaN | NaN | NaN | 0.318681 | 0.898990 | NaN | NaN | NaN | NaN | -0.275362 | 0.736842 | NaN | NaN | NaN | NaN | -1.000000 | -1.000000 | NaN | NaN | NaN | NaN | -1.000000 | -1.000000 | 6-12 | 0 |
| 4 | 0 | 1 | 60th | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -1.0 | -0.871658 | -0.871658 | -0.871658 | -0.871658 | -1.0 | -0.863874 | -0.863874 | -0.863874 | -0.863874 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.414634 | -0.414634 | -0.414634 | -0.414634 | -1.0 | -0.979069 | -0.979069 | -0.979069 | -0.979069 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.326531 | 0.326531 | ... | -0.846633 | -1.0 | -0.836145 | -0.836145 | -0.836145 | -0.836145 | -1.0 | -0.996762 | -0.996762 | -0.996762 | -0.996762 | -1.0 | -0.243021 | -0.338537 | -0.213031 | -0.317859 | 0.033779 | 0.665932 | -0.283951 | -0.376923 | -0.188679 | -0.379310 | 0.035714 | 0.631579 | -0.340206 | -0.4875 | -0.572650 | -0.857143 | 0.098901 | 0.797980 | -0.076923 | 0.286486 | 0.298507 | 0.272727 | 0.362319 | 0.947368 | -0.33913 | 0.325153 | 0.114504 | 0.176471 | -0.238095 | -0.818182 | -0.389967 | 0.407558 | -0.230462 | 0.096774 | -0.242282 | -0.814433 | ABOVE_12 | 1 |
5 rows × 231 columns
data.tail()
|  | PATIENT_VISIT_IDENTIFIER | AGE_ABOVE65 | AGE_PERCENTIL | GENDER | DISEASE GROUPING 1 | DISEASE GROUPING 2 | DISEASE GROUPING 3 | DISEASE GROUPING 4 | DISEASE GROUPING 5 | DISEASE GROUPING 6 | HTN | IMMUNOCOMPROMISED | OTHER | ALBUMIN_MEDIAN | ALBUMIN_MEAN | ALBUMIN_MIN | ALBUMIN_MAX | ALBUMIN_DIFF | BE_ARTERIAL_MEDIAN | BE_ARTERIAL_MEAN | BE_ARTERIAL_MIN | BE_ARTERIAL_MAX | BE_ARTERIAL_DIFF | BE_VENOUS_MEDIAN | BE_VENOUS_MEAN | BE_VENOUS_MIN | BE_VENOUS_MAX | BE_VENOUS_DIFF | BIC_ARTERIAL_MEDIAN | BIC_ARTERIAL_MEAN | BIC_ARTERIAL_MIN | BIC_ARTERIAL_MAX | BIC_ARTERIAL_DIFF | BIC_VENOUS_MEDIAN | BIC_VENOUS_MEAN | BIC_VENOUS_MIN | BIC_VENOUS_MAX | BIC_VENOUS_DIFF | BILLIRUBIN_MEDIAN | BILLIRUBIN_MEAN | BILLIRUBIN_MIN | BILLIRUBIN_MAX | BILLIRUBIN_DIFF | BLAST_MEDIAN | BLAST_MEAN | BLAST_MIN | BLAST_MAX | BLAST_DIFF | CALCIUM_MEDIAN | CALCIUM_MEAN | ... | TTPA_MAX | TTPA_DIFF | UREA_MEDIAN | UREA_MEAN | UREA_MIN | UREA_MAX | UREA_DIFF | DIMER_MEDIAN | DIMER_MEAN | DIMER_MIN | DIMER_MAX | DIMER_DIFF | BLOODPRESSURE_DIASTOLIC_MEAN | BLOODPRESSURE_SISTOLIC_MEAN | HEART_RATE_MEAN | RESPIRATORY_RATE_MEAN | TEMPERATURE_MEAN | OXYGEN_SATURATION_MEAN | BLOODPRESSURE_DIASTOLIC_MEDIAN | BLOODPRESSURE_SISTOLIC_MEDIAN | HEART_RATE_MEDIAN | RESPIRATORY_RATE_MEDIAN | TEMPERATURE_MEDIAN | OXYGEN_SATURATION_MEDIAN | BLOODPRESSURE_DIASTOLIC_MIN | BLOODPRESSURE_SISTOLIC_MIN | HEART_RATE_MIN | RESPIRATORY_RATE_MIN | TEMPERATURE_MIN | OXYGEN_SATURATION_MIN | BLOODPRESSURE_DIASTOLIC_MAX | BLOODPRESSURE_SISTOLIC_MAX | HEART_RATE_MAX | RESPIRATORY_RATE_MAX | TEMPERATURE_MAX | OXYGEN_SATURATION_MAX | BLOODPRESSURE_DIASTOLIC_DIFF | BLOODPRESSURE_SISTOLIC_DIFF | HEART_RATE_DIFF | RESPIRATORY_RATE_DIFF | TEMPERATURE_DIFF | OXYGEN_SATURATION_DIFF | BLOODPRESSURE_DIASTOLIC_DIFF_REL | BLOODPRESSURE_SISTOLIC_DIFF_REL | HEART_RATE_DIFF_REL | RESPIRATORY_RATE_DIFF_REL | TEMPERATURE_DIFF_REL | OXYGEN_SATURATION_DIFF_REL | WINDOW | ICU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1920 | 384 | 0 | 50th | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.012346 | -0.292308 | 0.056604 | -0.525424 | 0.535714 | 0.789474 | 0.012346 | -0.292308 | 0.056604 | -0.517241 | 0.535714 | 0.789474 | 0.175258 | -0.050 | 0.145299 | -0.428571 | 0.714286 | 0.919192 | -0.299145 | -0.502703 | -0.164179 | -0.575758 | 0.246377 | 0.789474 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 0-2 | 0 |
| 1921 | 384 | 0 | 50th | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.605263 | 0.605263 | 0.605263 | 0.605263 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -0.717277 | -0.717277 | -0.717277 | -0.717277 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.170732 | -0.170732 | -0.170732 | -0.170732 | -1.0 | -0.982208 | -0.982208 | -0.982208 | -0.982208 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.244898 | 0.244898 | ... | -0.869210 | -1.0 | -0.879518 | -0.879518 | -0.879518 | -0.879518 | -1.0 | -0.979571 | -0.979571 | -0.979571 | -0.979571 | -1.0 | 0.086420 | -0.384615 | -0.113208 | -0.593220 | 0.142857 | 0.578947 | 0.086420 | -0.384615 | -0.113208 | -0.586207 | 0.142857 | 0.578947 | 0.237113 | -0.125 | -0.008547 | -0.500000 | 0.472527 | 0.838384 | -0.247863 | -0.567568 | -0.298507 | -0.636364 | -0.072464 | 0.578947 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 2-4 | 0 |
| 1922 | 384 | 0 | 50th | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.086420 | -0.230769 | -0.169811 | -0.593220 | 0.142857 | 0.736842 | 0.086420 | -0.230769 | -0.169811 | -0.586207 | 0.142857 | 0.736842 | 0.237113 | 0.000 | -0.059829 | -0.500000 | 0.472527 | 0.898990 | -0.247863 | -0.459459 | -0.343284 | -0.636364 | -0.072464 | 0.736842 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 4-6 | 0 |
| 1923 | 384 | 0 | 50th | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.209877 | -0.384615 | -0.188679 | -0.661017 | 0.285714 | 0.473684 | 0.209877 | -0.384615 | -0.188679 | -0.655172 | 0.285714 | 0.473684 | 0.340206 | -0.125 | -0.076923 | -0.571429 | 0.560440 | 0.797980 | -0.162393 | -0.567568 | -0.358209 | -0.696970 | 0.043478 | 0.473684 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 6-12 | 0 |
| 1924 | 384 | 0 | 50th | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.605263 | 0.605263 | 0.605263 | 0.605263 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.317073 | -0.317073 | -0.317073 | -0.317073 | -1.0 | -0.983255 | -0.983255 | -0.983255 | -0.983255 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.306122 | 0.306122 | ... | -0.846633 | -1.0 | -0.807229 | -0.807229 | -0.807229 | -0.807229 | -1.0 | -0.888448 | -0.888448 | -0.888448 | -0.888448 | -1.0 | -0.185185 | -0.539103 | -0.107704 | -0.610169 | 0.050595 | 0.662281 | -0.160494 | -0.538462 | -0.075472 | -0.586207 | 0.071429 | 0.631579 | -0.175258 | -0.375 | -0.247863 | -0.785714 | 0.186813 | 0.777778 | -0.247863 | -0.470270 | -0.149254 | -0.515152 | 0.101449 | 0.842105 | -0.652174 | -0.644172 | -0.633588 | -0.647059 | -0.547619 | -0.838384 | -0.701863 | -0.585967 | -0.763868 | -0.612903 | -0.551337 | -0.835052 | ABOVE_12 | 0 |
5 rows × 231 columns
Observation:
print("Dataset contains (rows, cols):",data.shape)
Dataset contains (rows, cols): (1925, 231)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1925 entries, 0 to 1924
Columns: 231 entries, PATIENT_VISIT_IDENTIFIER to ICU
dtypes: float64(225), int64(4), object(2)
memory usage: 3.4+ MB
Observation:
profile_data = pdp.ProfileReport(data,
                                 minimal=True,
                                 explorative=True,
                                 title='ProfilingResults',
                                 progress_bar=True)
profile_data